chain-of-thought prompting
Improving Alignment Between Human and Machine Codes: An Empirical Assessment of Prompt Engineering for Construct Identification in Psychology
Anglin, Kylie L., Milan, Stephanie, Hernandez, Brittney, Ventura, Claudia
Due to their architecture and vast pre-training data, large language models (LLMs) demonstrate strong text classification performance. However, LLM output (here, the category assigned to a text) depends heavily on the wording of the prompt. While literature on prompt engineering is expanding, few studies focus on classification tasks, and even fewer address domains like psychology, where constructs have precise, theory-driven definitions that may not be well represented in pre-training data. We present an empirical framework for optimizing LLM performance for identifying constructs in texts via prompt engineering. We experimentally evaluate five prompting strategies (codebook-guided empirical prompt selection, automatic prompt engineering, persona prompting, chain-of-thought reasoning, and explanatory prompting) with zero-shot and few-shot classification. We find that persona, chain-of-thought, and explanations do not fully address performance loss accompanying a badly worded prompt. Instead, the most influential features of a prompt are the construct definition, task framing, and, to a lesser extent, the examples provided. Across three constructs and two models, the classifications most aligned with expert judgments resulted from a few-shot prompt combining codebook-guided empirical prompt selection with automatic prompt engineering. Based on our findings, we recommend that researchers generate and evaluate as many prompt variants as feasible, whether human-crafted, automatically generated, or ideally both, and select prompts and examples based on empirical performance in a training dataset, validating the final approach in a holdout set. This procedure offers a practical, systematic, and theory-driven method for optimizing LLM prompts in settings where alignment with expert judgment is critical.
- North America > United States > Connecticut (0.04)
- Asia > India (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Education (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Can we use LLMs to bootstrap reinforcement learning? -- A case study in digital health behavior change
Albers, Nele, de Groot, Esra Cemre Su, Keijsers, Loes, Hillegers, Manon H., Krahmer, Emiel
Personalizing digital applications for health behavior change is a promising route to making them more engaging and effective. This especially holds for approaches that adapt to users and their specific states (e.g., motivation, knowledge, wants) over time. However, developing such approaches requires making many design choices, whose effectiveness is difficult to predict from literature and costly to evaluate in practice. In this work, we explore whether large language models (LLMs) can be used out-of-the-box to generate samples of user interactions that provide useful information for training reinforcement learning models for digital behavior change settings. Using real user data from four large behavior change studies as comparison, we show that LLM-generated samples can be useful in the absence of real data. Comparisons to the samples provided by human raters further show that LLM-generated samples reach the performance of human raters. Additional analyses of different prompting strategies, including shorter and longer prompt variants, chain-of-thought prompting, and few-shot prompting, show that the relative effectiveness of different strategies depends on both the study and the LLM, with relatively large differences even between prompt paraphrases alone. We provide recommendations for how LLM-generated samples can be useful in practice.
- Europe > Austria > Vienna (0.14)
- Europe > Netherlands > South Holland > Rotterdam (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (11 more...)
RubiSCoT: A Framework for AI-Supported Academic Assessment
Fröhlich, Thorsten, Schlippe, Tim
The evaluation of academic theses is a cornerstone of higher education, ensuring rigor and integrity. Traditional methods, though effective, are time-consuming and subject to evaluator variability. This paper presents RubiSCoT, an AI-supported framework designed to enhance thesis evaluation from proposal to final submission. Using advanced natural language processing techniques, including large language models, retrieval-augmented generation, and structured chain-of-thought prompting, RubiSCoT offers a consistent, scalable solution. The framework includes preliminary assessments, multidimensional assessments, content extraction, rubric-based scoring, and detailed reporting. We present the design and implementation of RubiSCoT, discussing its potential to optimize academic assessment processes through consistent, scalable, and transparent evaluation.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- Europe > Germany > North Rhine-Westphalia > Cologne Region > Cologne (0.04)
- Instructional Material (1.00)
- Overview (0.95)
- Research Report > Experimental Study (0.69)
- Education > Educational Technology > Educational Software > Computer-Aided Assessment (1.00)
- Education > Educational Setting (1.00)
- Education > Assessment & Standards (1.00)
- North America > United States > Ohio (0.04)
- Oceania > New Zealand (0.04)
- Oceania > Cook Islands (0.04)
- (5 more...)
- North America > United States > Pennsylvania (0.04)
- Asia > Vietnam (0.04)
- North America > Mexico (0.04)
- Media (0.93)
- Consumer Products & Services (0.67)
- Leisure & Entertainment > Sports > Basketball (0.46)
- Leisure & Entertainment > Sports > Football (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Texas > Travis County > Austin (0.27)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- (30 more...)
- Research Report > New Finding (0.92)
- Research Report > Experimental Study (0.67)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
- (2 more...)
- Information Technology > Software > Programming Languages (1.00)
- Information Technology > Data Science > Data Quality (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- (7 more...)
- North America > United States > Ohio (0.28)
- Europe > Germany (0.27)
- North America > United States > Texas > Travis County > Austin (0.27)
- (12 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- (3 more...)
- North America > United States > Pennsylvania (0.04)
- Asia > Vietnam (0.04)
- North America > Mexico (0.04)
- Media (0.68)
- Leisure & Entertainment > Sports > Basketball (0.46)
- Leisure & Entertainment > Sports > Football (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)